Results 1 - 8 of 8
1.
BMC Genomics ; 24(1): 266, 2023 May 18.
Article in English | MEDLINE | ID: covidwho-2321452

ABSTRACT

BACKGROUND: The prevalence of COVID-19 in recent years and its widespread impact on mortality, as well as on various aspects of life around the world, have made it important to study this disease and its viral cause. However, the very long genome sequences of this virus increase the processing time, computational complexity, and memory consumption required by the available tools to compare and analyze the sequences. RESULTS: We present a new encoding method, named PC-mer, based on k-mers and the physicochemical properties of nucleotides. This method reduces the size of the encoded data by a factor of around 2^k compared to the classical k-mer-based profiling method. Moreover, using PC-mer, we designed two tools: 1) a machine-learning-based classification tool for coronavirus family members with the ability to receive input sequences from the NCBI database, and 2) an alignment-free computational comparison tool for calculating dissimilarity scores between coronaviruses at the genus and species levels. CONCLUSIONS: PC-mer achieves 100% accuracy despite the use of very simple machine-learning classification algorithms. Assuming dynamic-programming-based pairwise alignment as the ground truth, we achieved a degree of convergence of more than 98% for coronavirus genus-level sequences and 93% for SARS-CoV-2 sequences using PC-mer in the alignment-free classification method. These results suggest that PC-mer can serve as a replacement for alignment-based approaches in certain sequence analysis applications that rely on similarity/dissimilarity scores, such as sequence searching, sequence comparison, and certain phylogenetic analysis methods that are based on sequence comparison.
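
The abstract does not spell out the encoding, so the following is only a minimal Python sketch under a stated assumption: each nucleotide is reduced to three binary physicochemical classes (purine/pyrimidine, strong/weak hydrogen bonding, amino/keto), and k-mers are counted on each binary track instead of on the four-letter alphabet. The property table and the function names `kmer_profile` and `pc_profile` are illustrative and are not taken from the PC-mer software.

```python
from itertools import product

# ASSUMPTION: three binary physicochemical classes per nucleotide (1 = first class).
PROPERTIES = {
    "purine":       {"A": 1, "G": 1, "C": 0, "T": 0},  # purine vs pyrimidine ring
    "strong_hbond": {"C": 1, "G": 1, "A": 0, "T": 0},  # 3 vs 2 hydrogen bonds
    "amino":        {"A": 1, "C": 1, "G": 0, "T": 0},  # amino vs keto group
}


def kmer_profile(seq: str, k: int) -> list[int]:
    """Classical k-mer profile: one count per word over {A,C,G,T}, i.e. 4**k entries."""
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skips words containing N or other ambiguity codes
            counts[index[word]] += 1
    return counts


def pc_profile(seq: str, k: int) -> list[int]:
    """Property-based profile: 2**k binary-word counts per property, concatenated."""
    seq = seq.upper()
    profile: list[int] = []
    for mapping in PROPERTIES.values():
        counts = [0] * (2 ** k)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(b in mapping for b in word):
                code = 0
                for b in word:  # read the binary word as an integer index
                    code = (code << 1) | mapping[b]
                counts[code] += 1
        profile.extend(counts)
    return profile
```

For k = 5 the classical profile has 4^5 = 1024 entries, while the three binary tracks give 3 × 2^5 = 96, roughly a tenfold reduction; under this assumed scheme the general ratio is 4^k / (3 · 2^k) = 2^k / 3.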


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Phylogeny , Sequence Analysis, DNA , Nucleotides/genetics , Base Sequence , Algorithms
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: covidwho-2188254

ABSTRACT

Messenger RNA-based therapeutics have shown tremendous potential, as demonstrated by the rapid development of messenger RNA-based vaccines for COVID-19. Nevertheless, worldwide distribution of mRNA vaccines has been hampered by mRNA's inherent thermal instability due to in-line hydrolysis, a chemical degradation reaction. Therefore, predicting and understanding RNA degradation is a crucial and urgent task. Here we present RNAdegformer, an effective and interpretable model architecture that excels in predicting RNA degradation. RNAdegformer processes RNA sequences with self-attention and convolutions, two deep learning techniques that have proved dominant in the fields of computer vision and natural language processing, while utilizing biophysical features of RNA. We demonstrate that RNAdegformer outperforms the previous best methods at predicting degradation properties at nucleotide resolution for COVID-19 mRNA vaccines. RNAdegformer predictions also exhibit improved correlation with RNA in vitro half-life compared with the previous best methods. Additionally, we showcase how direct visualization of self-attention maps assists informed decision-making. Further, our model reveals important features in determining mRNA degradation rates via leave-one-feature-out analysis.


Subject(s)
COVID-19 , Deep Learning , Humans , COVID-19 Vaccines , Nucleotides/genetics , COVID-19/genetics , RNA , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA Stability
3.
Molecules ; 27(19)2022 Sep 28.
Article in English | MEDLINE | ID: covidwho-2066278

ABSTRACT

In designing effective siRNAs for a specific mRNA target, it is critically important to have predictive models for siRNA potency. None of the previously published methods characterized the chemical structures of the individual nucleotides constituting an siRNA molecule; therefore, they cannot predict the gene-silencing potency of chemically modified siRNAs (cm-siRNAs). We propose a new approach that can predict the gene-silencing potency of cm-siRNAs by characterizing each nucleotide (NT) with 12 BCUT cheminformatics descriptors describing its charge distribution, hydrophobicity, and polarity. Thus, a 21-NT siRNA molecule is described by 252 descriptors (21 × 12), obtained by concatenating the BCUT values of its constituent nucleotides. Partial least squares (PLS) regression is employed to develop statistical models. The Huesken dataset (2431 natural siRNA molecules) was used for model building and evaluation for natural siRNAs, and our results were comparable or superior to those of Huesken's algorithm. The Bramsen dataset (48 cm-siRNAs) was used to build and test the models for cm-siRNAs. The predictive r² of the resulting models reached 0.65 (Pearson r of 0.82). Thus, this new method can be used to successfully model the gene-silencing potency of both natural and chemically modified siRNA molecules.
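
The descriptor arithmetic (21 nucleotides × 12 BCUT descriptors = 252 features) can be illustrated with a short Python sketch. The descriptor values below are placeholders, not real BCUT values, and the token `mU` is a hypothetical label for a chemically modified uridine; the point is only that each nucleotide, natural or modified, contributes one fixed-length row to a concatenated feature vector that a regression model such as PLS would then consume.

```python
N_DESCRIPTORS = 12  # BCUT descriptors per nucleotide, as stated in the abstract


def featurize(sirna: list[str], table: dict[str, list[float]]) -> list[float]:
    """Concatenate per-nucleotide descriptor vectors in 5'->3' order."""
    features: list[float] = []
    for nt in sirna:
        vec = table[nt]  # KeyError if a nucleotide (or modification) has no descriptors
        if len(vec) != N_DESCRIPTORS:
            raise ValueError(f"expected {N_DESCRIPTORS} descriptors for {nt!r}")
        features.extend(vec)
    return features


# PLACEHOLDER values only; real BCUT descriptors would come from a cheminformatics toolkit.
# "mU" is a HYPOTHETICAL token showing that a modified nucleotide gets its own row.
PLACEHOLDER = {name: [float(i)] * N_DESCRIPTORS
               for i, name in enumerate(["A", "C", "G", "U", "mU"])}
```

Representing the siRNA as a list of tokens rather than a plain string is a design choice that lets multi-character labels for modified nucleotides sit alongside A, C, G, and U.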


Subject(s)
Cheminformatics , Gene Silencing , Nucleotides/genetics , RNA Interference , RNA, Messenger , RNA, Small Interfering/chemistry , RNA, Small Interfering/genetics
5.
Gene ; 835: 146641, 2022 Aug 15.
Article in English | MEDLINE | ID: covidwho-1885773

ABSTRACT

The subgenus Sarbecovirus includes two human viruses, SARS-CoV and SARS-CoV-2, respectively responsible for the SARS epidemic and the COVID-19 pandemic, as well as many bat viruses and two pangolin viruses. Here, the synonymous nucleotide composition (SNC) of Sarbecovirus genomes was analysed by examining third codon-positions, dinucleotides, and degenerate codons. The results show evidence for the following eight groups: (i) SARS-CoV related coronaviruses (SCoVrC; including many bat viruses from China), (ii) SARS-CoV-2 related coronaviruses (SCoV2rC; including five bat viruses from Cambodia, Thailand and Yunnan), (iii) pangolin sarbecoviruses, (iv) three bat sarbecoviruses showing evidence of recombination between SCoVrC and SCoV2rC genomes, (v) two highly divergent bat sarbecoviruses from Yunnan, (vi) the bat sarbecovirus from Japan, (vii) the bat sarbecovirus from Bulgaria, and (viii) the bat sarbecovirus from Kenya. All these groups can be diagnosed by specific nucleotide compositional features except the group involving recombination between SCoVrC and SCoV2rC. In particular, SCoV2rC genomes have fewer cytosines and more uracils at third codon-positions than other sarbecoviruses, whereas the genomes of pangolin sarbecoviruses show more adenines at third codon-positions. I suggest that taxonomic differences in the imbalanced nucleotide pools available in host cells during viral replication can explain the eight SNC groups detected here among Sarbecovirus genomes. A related effect due to hibernating bats and their latitudinal distribution is also discussed. I conclude that the two independent host switches from Rhinolophus bats to pangolins resulted in convergent mutational constraints and that SARS-CoV-2 emerged directly from a horseshoe bat sarbecovirus.
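
Third-codon-position composition, one of the quantities examined above, is simple to compute. This Python sketch assumes an in-frame, gap-free coding sequence and, as a simplification, counts every third position rather than restricting the count to synonymous (degenerate) sites as a full SNC analysis would.

```python
from collections import Counter


def third_position_composition(cds: str) -> dict[str, float]:
    """Base frequencies at third codon positions of an in-frame coding sequence.

    Uracil is reported as T; ambiguous bases are ignored; a trailing partial codon is dropped.
    """
    cds = cds.upper().replace("U", "T")
    full_length = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, full_length, 3)]
    counts = Counter(b for b in thirds if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: counts[b] / total for b in "ACGT"}
```

Comparing these fractions across genomes (for example, C versus U at third positions) is the kind of contrast the abstract reports between SCoV2rC and other sarbecoviruses.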


Subject(s)
COVID-19 , Chiroptera , Severe acute respiratory syndrome-related coronavirus , Animals , China/epidemiology , Chiroptera/genetics , Genome, Viral , Humans , Nucleotides/genetics , Pandemics , Pangolins , Phylogeny , SARS-CoV-2/genetics
6.
Commun Biol ; 4(1): 698, 2021 06 03.
Article in English | MEDLINE | ID: covidwho-1260958

ABSTRACT

Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and its mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched for or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and its mutations.
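
To make "conservation states based on sequence alignment patterns" more concrete, the Python sketch below turns a multi-strain alignment into one observation tuple per reference base. It assumes a simplified three-way encoding per strain (not aligned, aligned and matching, aligned and mismatching), in the spirit of ConsHMM but not its actual code; in the method these observations would be emitted by hidden conservation states of an HMM, whose training is not shown.

```python
# ASSUMED per-strain observation categories (a simplification, not ConsHMM's code).
NOT_ALIGNED, MATCH, MISMATCH = 0, 1, 2


def column_observations(alignment: dict[str, str], reference: str) -> list[tuple[int, ...]]:
    """One tuple per reference base; columns where the reference has a gap are skipped."""
    ref_row = alignment[reference]
    others = [name for name in alignment if name != reference]
    observations = []
    for col, ref_base in enumerate(ref_row):
        if ref_base == "-":
            continue  # not a base of the reference genome
        obs = []
        for name in others:
            base = alignment[name][col]
            if base == "-":
                obs.append(NOT_ALIGNED)
            elif base.upper() == ref_base.upper():
                obs.append(MATCH)
            else:
                obs.append(MISMATCH)
        observations.append(tuple(obs))
    return observations
```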


Subject(s)
COVID-19/virology , Genome, Viral , SARS-CoV-2/genetics , Animals , Base Sequence , Conserved Sequence , Evolution, Molecular , Genomics , Humans , Mutation , Nucleotides/genetics , Sequence Alignment
7.
Genomics ; 113(4): 2177-2188, 2021 07.
Article in English | MEDLINE | ID: covidwho-1233643

ABSTRACT

The ongoing COVID-19 pandemic has drawn the attention of the scientific community to the evolutionary origin of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This study is a comprehensive quantitative analysis of the protein-coding sequences of seven human coronaviruses (HCoVs) to decipher their nucleotide sequence variability and codon usage patterns, which are essential for understanding the survival ability of the viruses, their adaptation to hosts, and their evolution. The analysis revealed a high relative abundance (odds ratio) of the GC and CT dinucleotides in the first two and the last two codon positions, respectively, as well as a low abundance of the CG dinucleotide in the last two codon positions, which might be related to the evolution of the viruses. A remarkable variability in GC content at the third codon position was observed among the seven coronaviruses. Codons with high relative synonymous codon usage (RSCU) values are primarily from the aliphatic and hydroxyl amino acid groups, and codons with low RSCU values belong to the aliphatic, cyclic, positively charged, and sulfur-containing amino acid groups. In order to elucidate the evolutionary processes of the seven coronaviruses, a phylogenetic tree (dendrogram) was constructed based on the RSCU scores of the codons. The severe and mild CoVs were positioned in different clades. A comparative phylogenetic study with other coronaviruses showed that SARS-CoV-2 is close to the CoVs isolated from pangolins (Manis javanica, Pangolin-CoV) and cats (Felis catus, SARS(r)-CoV). Further analysis of the effective number of codons (ENC) showed a relatively higher codon usage bias for SARS-CoV and MERS-CoV than for SARS-CoV-2. The plot of ENC against GC3 suggested that mutational bias might have a role in determining the codon usage variation among the candidate viruses. A codon adaptability study on a few human host parasites (from different kingdoms), including CoVs, showed diverse adaptability patterns. SARS-CoV-2 and SARS-CoV exhibit relatively lower but similar codon adaptability compared to MERS-CoV.
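
Two of the quantities named above have standard definitions that a short Python sketch can make concrete: RSCU, the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally, and the dinucleotide odds ratio ρ_XY = f_XY / (f_X f_Y). The sketch assumes the standard genetic code and an in-frame, gap-free coding sequence; it is not the authors' pipeline, and ENC and codon adaptation calculations are not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU = observed codon count / mean count over its synonymous family."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():  # group synonymous codons (stops form one family)
        families.setdefault(aa, []).append(codon)
    values: dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence: RSCU undefined
        expected = total / len(codons)  # count per codon under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def dinucleotide_odds_ratio(seq: str, x: str, y: str) -> float:
    """rho_XY = f(XY) / (f(X) * f(Y)), all frequencies taken from the same sequence."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    f_x, f_y = seq.count(x) / n, seq.count(y) / n
    if f_x == 0 or f_y == 0:
        raise ValueError("mononucleotide absent; odds ratio undefined")
    pairs = sum(1 for i in range(n - 1) if seq[i:i + 2] == x + y)
    return (pairs / (n - 1)) / (f_x * f_y)
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above or below 1 indicate preference or avoidance, and ρ_XY values well below 1 (as reported here for CG) indicate an under-represented dinucleotide.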


Subject(s)
COVID-19/genetics , Codon Usage/genetics , Evolution, Molecular , SARS-CoV-2/genetics , Base Composition/genetics , COVID-19/virology , Codon/genetics , Computational Biology , Genome, Viral/genetics , Humans , Nucleotides/genetics , Pandemics , SARS-CoV-2/pathogenicity
8.
Sci Adv ; 6(25): eabb5813, 2020 06.
Article in English | MEDLINE | ID: covidwho-619103

ABSTRACT

The COVID-19 outbreak has become a global health risk, and understanding the response of the host to the SARS-CoV-2 virus will help to combat the disease. RNA editing by host deaminases is an innate restriction process to counter virus infection, but it is not yet known whether this process operates against coronaviruses. Here, we analyze RNA sequences from bronchoalveolar lavage fluids obtained from coronavirus-infected patients. We identify nucleotide changes that may be signatures of RNA editing: adenosine-to-inosine changes from ADAR deaminases and cytosine-to-uracil changes from APOBEC deaminases. Mutational analysis of genomes from different strains of Coronaviridae from human hosts reveals mutational patterns consistent with those observed in the transcriptomic data. However, the reduced ADAR signature in these data raises the possibility that ADARs might be more effective than APOBECs in restricting viral propagation. Our results thus suggest that both APOBECs and ADARs are involved in coronavirus genome editing, a process that may shape the fate of both virus and patient.
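
The two editing signatures can be illustrated by tallying mismatch types between a reference genome and an aligned viral sequence. This Python sketch assumes both sequences are already aligned, gap-free, and written on the viral RNA (sense) strand; A-to-I editing then appears as A>G because inosine is read as guanosine, and C-to-U editing appears as C>T. Strand handling, read alignment, and variant calling from sequencing data are not shown, and the labels in `SIGNATURES` are illustrative.

```python
from collections import Counter

# Mismatch classes read on the sense strand (ASSUMPTION: strand already resolved).
SIGNATURES = {("A", "G"): "ADAR (A-to-I)", ("C", "T"): "APOBEC (C-to-U)"}


def substitution_spectrum(reference: str, observed: str) -> Counter:
    """Count (ref, obs) base substitutions between two equal-length, gap-free sequences."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to equal length")
    spectrum: Counter = Counter()
    for ref, obs in zip(reference.upper(), observed.upper()):
        ref, obs = ref.replace("U", "T"), obs.replace("U", "T")
        if ref != obs and ref in "ACGT" and obs in "ACGT":  # ignore N and other ambiguity codes
            spectrum[(ref, obs)] += 1
    return spectrum


def editing_signature_counts(reference: str, observed: str) -> dict[str, int]:
    """Counts of the two deaminase-associated substitution classes."""
    spectrum = substitution_spectrum(reference, observed)
    return {label: spectrum[change] for change, label in SIGNATURES.items()}
```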


Subject(s)
Betacoronavirus/genetics , Betacoronavirus/metabolism , Coronavirus Infections/genetics , Host-Pathogen Interactions/genetics , Pneumonia, Viral/genetics , RNA Editing/genetics , Transcriptome , APOBEC Deaminases/genetics , APOBEC Deaminases/metabolism , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Base Sequence/genetics , Bronchoalveolar Lavage Fluid/virology , COVID-19 , Coronavirus Infections/virology , Genome, Viral/genetics , Humans , Mutation Rate , Nucleotides/genetics , Nucleotides/metabolism , Pandemics , Pneumonia, Viral/virology , RNA, Viral/genetics , SARS-CoV-2 , Virus Replication/genetics